Genome‐wide meta‐analysis and fine‐mapping prioritize potential causal variants and genes related to leprosy

Abstract To date, genome‐wide association studies (GWASs) have discovered 35 susceptible loci of leprosy; however, the cumulative effects of these loci can only partially explain the overall risk of leprosy, and the causal variants and genes within these loci remain unknown. Here, we conducted out new GWASs in two independent cohorts of 5007 cases and 4579 controls and then a meta‐analysis in these newly generated and multiple previously published (2277 cases and 3159 controls) datasets were performed. Three novel and 15 previously reported risk loci were identified from these datasets, increasing the known leprosy risk loci of explained genetic heritability from 23.0 to 38.5%. A comprehensive fine‐mapping analysis was conducted, and 19 causal variants and 14 causal genes were identified. Specifically, manual checking of epigenomic information from the Epimap database revealed that the causal variants were mainly located within the immune‐relevant or immune‐specific regulatory elements. Furthermore, by using gene‐set, tissue, and cell‐type enrichment analyses, we highlighted the key roles of immune‐related tissues and cells and implicated the PD‐1 signaling pathways in the pathogenetic mechanism of leprosy. Collectively, our study identified candidate causal variants and elucidated the potential regulatory and coding mechanisms for genes associated with leprosy.


INTRODUCTION
Infectious diseases remain one of the major challenges to human health and survival for centuries, and they are among the leading causes of death and disability worldwide. 1,2The occurrence and progression of infectious diseases depend on host exposure to pathogens and the invasiveness of pathogens.Individuals exposed to the same pathogen, however, show diverse clinical manifestations, thus indicating the critical role of host genetic factors in disease outcomes. 3,4In the past few decades, a series of susceptibility loci of infectious diseases have been found through candidate gene analysis or genome-wide association studies (GWASs), expanding our understanding of the genetic basis of infectious diseases. 5eprosy, a chronic granulomatous infectious disease caused by Mycobacterium leprae, primarily affects the skin and peripheral nerves and leads to irreversible disability and deformities. 3Given that only a small number of individuals exposed to M. leprae are successfully infected and that clinical disease outcomes show marked interindividual variations, leprosy is an ideal model for genetic studies. 3][8][9][10][11][12][13][14] However, all of the identified susceptibility genes/loci can explain only 23.0% of the overall risk of leprosy. 12Given an estimated heritability of up to 57%, 15 more genetic susceptibility loci remain to be revealed.
GWAS loci often harbor many variants and genes, and most of the significantly associated variants are located in the nonprotein-coding regions of the genome. 16][10][11][12][13][14]17,18 Thus, post-GWASs mainly focus on finemapping research, which could enable to identify causal genes or variants and establish a foundation for further functional exploration and interpretation of the pathogenesis of diseases. 19,20Large-scale genetic fine-mapping analyses have been conducted for infectious and autoimmune diseases. 16,19,21,22For instance, putative causal genes identified by expression quantitative trait loci (eQTL) colocalization display the importance of hair follicles in susceptibility to acne 23 ; causal variants identified by multiomic epigenetic annotation are implicated in oligodendrocyteand neuron-specific functions in Alzheimer's disease and Parkinson's disease 24 ; and tissue-or cell-type enrichment analysis enables to choose the relevant tissue or cell type and to further elucidate the molecular mechanisms associated with significant loci. 25,26These fine-mapping approaches demonstrated the complexities of interpreting functional noncoding single nucleotide polymorphisms (SNPs), thus contributing to the identification of new therapeutic targets.
To further characterize the genetic architecture of leprosy, we conducted a meta-analysis of GWASs involving 7284 cases and 7738 controls and further performed fine-mapping and functional analyses of all the 18 identified loci, except the major histocompatibility complex (MHC) region.Our study revealed 19 putative causal genetic variants and 14 candidate disease genes within nine leprosy loci.Gene-set analysis and tissue and cell specificity analysis showed the important role of the synergistic antimicrobial responses between phagocytes and T cells through multiple immune-related pathways in leprosy.

Meta-analysis identified three new leprosy-associated loci
In the present study, two new GWASs (GWAS5 and GWAS6) were subjected to genotyping by a populationoptimized array (Illumina Asian Screening Array); GWAS5 included 3298 leprosy cases and 3272 healthy controls from northern China, while GWAS6 included 2302 leprosy cases and 1632 healthy controls from southern China.Principal component (PC) analysis confirmed that all the cases and controls in both GWAS5 and GWAS6 were well matched (Figure S1).Finally, 474,857 SNPs in 6101 samples (2991 cases and 3110 controls) of GWAS5 and 477,111 SNPs in 3485 samples (2016 cases and 1469 controls) of GWAS6 passed the SNP and sample quality control (QC) filters.Demographic information and more details of these studies are provided in the Methods section and Table S1.
The two new GWASs and four previously published GWASs (GWAS1−4) 6,9,11 were then imputed separately by using the Chinese population-specific reference panel (China Metabolic Analytics Project, ChinaMAP), which contains 10,155 unrelated Chinese individuals and has been reported to achieve high-quality imputation in Chinese populations. 27After performing stringent QC, 5,276,914 shared SNPs in 7284 cases and 7738 controls remained to be analyzed by meta-analysis.The quantilequantile (Q-Q) plots and genomic inflation factors for individual studies (λ GC = 1.02-1.07)or the meta-analysis (λ GC = 1.09, λ GC1000 = 1.01) did not show evidence of a substantial inflation rate (Figure S2).An overview of the analysis strategy is presented in Figure S3.
We observed genome-wide significant (p < 5 × 10 −8 ) associations at 18 loci (Figures S4 and S5 and Table 1), including 15 previously established loci 6,7,9,10,13 and three novel leprosy susceptibility loci located near the METTL1, MYL2, and CSK genes.Regarding the other known susceptibility loci, 12 susceptibility loci showed consistent associations at nominal significance (p < 0.05, same effect direction); however, the SNP rs4833095 in TLR1 reported in an Indian population 14 did not show any association (p = 0.74).The other three SNPs located in CARD9, NCKIPSD, and HIF1A and five SNPs located in SLC7A2, TYK2, GIT2, ALDH2, and IL23R did not pass the QC because of a low minor allele frequency (MAF) and low imputation quality score, respectively (Table S2 and Figure S4).
Next, stepwise conditioning was used to identify independent signals at the 17 genome-wide significant loci except the MHC region.Only three loci at EGR2, TNFSF15, and LACC1 showed a second independent signal (Table S3 and Figures S5A-C).
We also estimated the heritability of leprosy by using all 38 significant loci (35 known and three novel susceptibility loci) and all the common SNPs across the genome.Assuming a population prevalence of 0.0001 for leprosy, the risk loci explained 6.71% of the variance in leprosy liability, while common genetic variation across the genome explained 17.43% (s.e.= 0.01) of the variance in leprosy liability.These results indicate that all the known leprosy risk loci explain 38.50% (6.71%/17.43%) of the overall genetic liability for leprosy.

Fine-mapping analysis identified credibly causal variants and genes
To identify candidate causal variants and/or genes underlying 20 independent associations in the 17 loci outside the MHC region, we performed comprehensive fine-mapping, colocalization, and network analyses (Figure S3).First, we limited the number of potential causal variants in all the 20 independent associations to 350 SNPs (95% credible set [CS]) by the susieR package in R software, which can implement methods for variable selection in linear regression based on the "Sum of Single Effects" (SuSiE) model.We then annotated all the 350 variants based on their potential deleterious effect on protein function or interference in gene expression regulation.We found eight protein-damaging variants in four locus with a combined annotation-dependent depletion (CADD) score > 15 (Figure 1A) or SpliceAI score > 0.8 (Figure 1B).We also identified another 11 regulatory variants in six locus through manual checking of chromatin interaction information from the Epigenome Browser and the HiCHIPdb database (Figure 1C) or a colocalization analysis by using eQTL datasets of skin, nerve, and whole blood (Figures 1D-F).All these 19 causal variants were located in 12 causal genes, including ADGB, TNFSF15, ADO, SLC29A3, METTL1, METTL21B, TSFM, CSK, IL27, SULT1A2, SNX20, and NOD2.In addition, IL18RAP and CYP27B1 were pinpointed as causal genes through colocalized the leprosy-associated signals with eQTL signals of skin and nerve (Figures 1E and F).Taken together, eight protein-damaging variants and 11 regulatory variants were implicated as causal variants, and 14 genes were identified as causal genes.Figure 1 shows the results of the identified credibly causal variants and genes.
For the novel leprosy risk locus located on chr12:57,665,085-58,665,085, the 95% CS comprised 35 variants, including one protein-relevant variant and two regulatory-relevant variants (Figures 2A-C and Table S4).The leading SNP rs10877013 (p = 1.93 × 10 −9 , OR = 0.85) located in the intronic region of METTL1 (Figure 2A) was not included in the list of the final credibly causal variants.The SNP rs2014886 located in the intronic region of TSFM was predicted by SpliceAI software to affect the splicing of TSFM (NM_005726.6) with a splicing probability of 0.82 for donor gain (Figure 1B).Two variants, namely rs2291617 in the 5ʹ-UTR region of the METTL21B gene (also named as EEF1AKMT3 and FAM119B) and rs10747783 in the first exon of TSFM, were located in the DNase hypersensitivity peak, H3K4me3 peak, and H3K27ac peak as regulatoryrelevant variants (Figure 2D).Moreover, according to the information from the epigenomic datasets, rs2291617 was located in the transcription start site (TSS) of METTL21B and METTL1, whereas rs10747783 was located in the TSS of TSFM.A comparison of the chromatin state showed almost no difference between immune and nonimmune cells (Figure 2E).Similarly, rs2291617 and rs10747783 showed significant eQTL effects in diverse tissues, including the tibial nerve, whole blood, skin, and brain, for METTL1, TSFM, METTL21B, and CYP27B1 (Table S5).The GWAS association was colocalized with the eQTL effect on the expression of TSFM, METTL21B, and CYP27B1 in both the skin and tibial nerve and METTL21B in the blood (PP.H4.abf > 0.8) (Figures 1D-F and Table S6).Therefore, our fine-mapping and functional annotation analyses implicated SNPs rs2014886, rs2291617, and rs10747783; and genes METTL1, TSFM, METTL21B, and CYP27B1 as candidate causal variants and genes, respectively, in the locus chr12:57,665,085-58,665,085.
For the locus chr15:74,587,571-75,587,571, there were 13 variants in the 95% CS, including four SNPs located in the DNase hypersensitivity peak or histone acetylation or methylation peaks in peripheral blood mononuclear cells (PBMCs): the leading SNP rs117618569 (p = 7.07 × 10 , e SNX20 f Locus, the names of previously reported locus are indicated in parentheses. N signal, the number of independent signals at this locus.SNP, lead SNP of each independent signal; CHR, Chromosome; BP, position based on hg19 coordinates; A1, minor allele; A2, major allele.F_A, minor allele frequency of affected cases; F_U, minor allele frequency of unaffected controls.95% CS SNPs, the number of SNPs within the 95% credible set.NA, not applicable.intronic region of CSK, SNP rs2229729, which located in the coding region of CSK, and SNP rs35290121, which located in the upstream of CSK (Figures 3A-D).The chromatin state of the four SNPs in 833 human samples showed strong heterogeneity across different samples (Figure 3E), particularly in immune and nonimmune samples (Figure 4A).The variants rs35290121 and rs16972628 were almost exclusively present in the flanking TSS upstream region in the immune samples, SNPs rs117618569 and rs2229729 were located in the intergenic enhancer region in immune samples, whereas in the transcription region in nonimmune samples (Figures 4A  and B).Based on the information from the HiChIPdb database (a database of HiChIP regulatory interactions), only rs35290121 (SNP in the TSS region) and rs2229729 (SNP in the intergenic enhancer region) were located in the interacting sequences (looping) between chr15:75,070,000-75,075,000 and chr15:75,090,000-75,095,000 (Figure 4C).Given the region chr15:75,090,000-75,095,000 that the SNP rs2229729 located in was an enhancer region, we checked its potential targeted genes and found CSK promoter region chr15:75,070,000-75,075,000 harbored the strongest interaction than other genes (Figure S6).Taken together, these results suggest that SNPs rs35290121 and rs2229729 were candidate causal variants that may alter CSK expression in immune cells.
For the other new loci chr12:110,914,461-111,914,461, rs12229654 (p = 4.36 × 10 −8 , OR = 1.20) were the leading SNP, and its nearest protein-coding gene was MYL2 (Table 1).However, we did not find the potential causal variants in this region.
In the known leprosy loci, we found seven proteinrelevant variants with a CADD score of >15 (Figure 1A and Table S4).Four variants located in coding region, including rs2236295 in the first exon of the gene ADO (NM_032804.5),rs780668 in the fourth exon of the gene SLC29A3 (NM_018344.5), and rs181206 and rs1059491 in the IL27 (NM_145659.3) and SULT1A2 (NM_177528.2) genes, respectively (Table S7).Three other variants were located in the intronic region or near the genes SLC29A3, ZNF365, and ADO (Table S4 and S7).Moreover, signals of DNase-seq, H3K4me3 ChIP-seq, or H3K27ac ChIPseq were observed in skin or PBMCs in SNPs among the previously identified known leprosy loci, including loci that EGR2, TNFSF15, RAB32, and NOD2 located in (Figures 1C and S7A-D).We also found colocalization in four known loci and eQTL data with PP.H4.abf > 0.8, including IL18RAP and SULT1A2 in skin tissue, NOD2 in whole blood and the tibial nerve, and TNFSF15 in whole blood (Figures 1D-F and Table S6).

Gene-set, tissue, and cell-type enrichment analyses
Gene-set analysis was conducted using MAGMA software to delineate disease-related gene sets and pathways.Three Gene Ontology (GO) gene sets and four Kyoto Encyclopedia of Genes and Genomes (KEGG) terms and one REACTOME term were significantly overrepresented in our GWAS summary data as follows: GOMF_MHC class II receptor activity (p = 2.16 × 10 −8 ); GOBP_Peptide antigen assembly with MHC class II protein complex (p = 7.77 × 10 ).We also found that REACTOME_interferon gamma signaling was enriched on the boundary (p = 4.00 × 10 −4 ).The 10 most significant pathways are shown in Table S8.9][30] Based on the peaks of the chromatin marker, tissue-and cell-type enrichment analysis suggested that the DNase marker was significantly enriched in primary PBMCs (−log10 p = 3.76) and primary natural killer (NK) cells from peripheral blood (−log10 p = 2.91), while H3K4me1 and H3K4me3 markers were enriched in PMA-I-stimulated primary T helper 17 cells (−log10 p = 3.21 and 3.11, respectively) (Table S10).These results highlight the importance of immune- related tissues, cells, and pathways in the pathogenesis of leprosy.

Phenome-wide association study identified associations between leprosy and other diseases
Given that pleiotropic effects among leprosy and autoimmunity/inflammatory diseases have been uncovered, 9,10 to identify other diseases that are likely to be associated with leprosy at the variant level, we conducted a phenomewide association study (pheWAS) on each leading SNP in the GWAS Atlas database (https://atlas.ctglab.nl) that includes 4756 GWAS summary statistics.We confirmed the previous observation that the leading SNPs in the IL12B, IL18RAP, IL23R, and TNFSF15 genes showed discor-dant effects on the risk between leprosy and inflammatory bowel disease (IBD), Crohn's disease (CD), and ulcerative colitis (UC), whereas the leading SNP in the LACC1 gene showed concordant effects on the risk between leprosy and IBD and CD.We also found that the leading SNP in the IL18RAP locus showed concordant effects on the risk between leprosy and asthma, whereas the leading SNPs in the novel locus chr12:57,665,085-58,665,085 and the IL27 locus showed discordant effects on the risk between leprosy and rheumatoid arthritis, multiple sclerosis, IBD, CD, and UC (Table S11).

DISCUSSION
The identification of genetic variants associated with complex human diseases and traits have provided valuable insights into the complexities of their genetic architecture.
2][33][34] The use of a large sample size to enhance the statistical power of association analysis of variants and the use of samples genotyped using population-specific arrays and imputed based on population-specific reference panels can help to identify novel variants. 35,36Here, we performed a large GWAS analysis consisting of 7284 leprosy cases and 7738 controls and revealed three novel susceptible loci and 15 previously established locus, further expanding understanding of the genetic landscape underlying leprosy.To improve the fine-mapping resolution and help prioritization of the subsequent functional studies, we further performed a comprehensive fine-mapping analysis and 5.4% (19 out of 350) variants were identified as credibly causal variants from the 350 SNPs contained within the 95% CS.These small number of credibly causal variants provide a practical starting point for additional experiments. 37Among them, nine variants and 10 genes were highlighted by the software.What is more, the remaining 10 variants and four genes were detected by manual checking of the information from the chromatin interaction database.These results demonstrate the importance of utilizing multiple genomic annotation resources and manual annotation.
In the three newly identified loci, CSK, METTL1, METTL21B, and CYP27B1 were identified as candidate causal genes.CSK, encoding C-terminal Src kinase, inhibits T cell activation by suppressing the Src family tyrosine kinases Lck and Fyn 38 and interacting with activated CD28 to mute TCR signaling. 39The inhibition of T cell-activated IFN-γ immunity is thought to be critical for susceptibility to leprosy and infection by other mycobacteria. 40METTL1, METTL21B, and CYP27B1 involve in vitamin D-mediated antibacterial activity, which is a key signaling associated with infection. 41METTL1 and METTL21B introns bind to the vitamin D receptor (VDR) to regulate the transcription of CYP27B1 in response to the parathyroid hormone. 42CYP27B1, the 25-hydroxyvitamin D 1α-hydroxylase, leads to production of active vitamin D3 at sites of infection, which is critical for the TLR2/1induced antimicrobial activity in macrophages infected with Mycobacterium tuberculosis. 43The causal genes pinpointed in the novel loci further expand our understanding of immune-relevant mechanisms underlying leprosy based on the genetic basis.
Regarding the known loci, several genes like ADGB (C6orf103) 7 and SNX20 6 located in the linkage disequilibrium region of the top susceptible sites in our previous studies, but they were not suggested as candidate genes due to the deficiency of functional annotation.In the present study, ADGB, SNX20, ADO, and SULT1A2 were pinpointed as novel causal genes by comprehensive finemapping methods, and some of them were reported to be involved in immune responses.SNX20, encoding the sorting nexin 20 protein, plays a crucial role in the immune cell infiltration and regulates various immune molecules in the tumor microenvironment. 44We therefore hypothesize that variants of SNX20 may associated with the immune cell infiltration during the infection of M. leprae.ADO, encoding cysteamine (2-aminoethanethiol) dioxygenase, regulates the accumulation of IL-32, 45 which is critical for NOD2-mediated differentiation of monocytes to CD1b + dendritic cells. 46The identification of ADO further established the key role of NOD2-medaited signaling pathway in the susceptibility of M. leprae. 9However, the effects of ADGB and SULT1A2 in infection and immune responses remain to be clarified.The LRRK2 loci that showed suggestive association with leprosy in our previous study reached genome-wide significant level in this study. 6LRRK2 gene is highly expressed in immune cells and is biochemically linked with the processes of inflammation, autophagy, and phagocytosis in immune-related disorders such as Parkinson's disease, IBD, tuberculosis, and leprosy. 47Moreover, the known susceptibility genes identified by their reported function of immune regulation previously, like IL18RAP, IL12B, NOD2, IL27, and TNFSF15, were also highlighted in the 15 established loci by comprehensive fine-mapping methods here, which further confirmed the precision of our fine-mapping strategy and the immune-associated mechanisms in leprosy, providing a foundation for further functional exploration and interpretation of the pathogenesis.
Tissues and cell type enrichment analysis confirmed the important role of established tissue and immune cell types, including blood, primary PBMCs, NK cells, and PMA-I-stimulated primary T helper 17 cells. 3,48Intriguingly, tissue-and cell-specific gene expression analysis showed significant results in synovial fluid, which is consistent with that acid-fast bacilli were found within the synovial fluid of the patients with erythema nodosum leprosum or leprosy-related chronic arthritis. 28,29Moreover, we first clarified the PD-1 signaling and further confirmed the IFN-γ signaling 10,49 by gene-set pathway analysis.PD-1 signaling, enriched by the susceptibility genes of CSK and HLAs, is upregulated by chronic infection to inactivate TCR signaling by recruiting tyrosine kinase CSK, leading to the exhaustion of T cells and inhibition of the T cellactivated IFN-γ immunity. 50o systematically illustrate the molecular mechanisms underlying leprosy, on the basis of the functional annotation of previously identified causal genes including NOD2, RIPK2, TNFSF15, LACC1, IL12B, HLA-DR, IL23R, LRRK2, and TYK2 6,10,13,18 and the newly pinpointed candidate gene CSK, we speculated that these variants may interfere with the recognition, presentation, and clearance of M. leprae.3][54] Subsequently, antigen presentation by HLA 55 and cytokine (IL-12 and IL-23) secretion activate T cells to promote the IFN-γmediated antimicrobial response through the JAK/STAT signaling pathway. 40,56Chronic M. leprae infection upregulates the expression of the inhibitory immune receptor PD-1, which recruits the tyrosine kinase CSK 50 and inactivates TCR signaling, resulting in the exhaustion of T cells and inhibition of the T cell-mediated antimicrobial response (Figure 5).Thus, the leprosy susceptibility loci, including NOD2, RIPK2, TNFSF15, LACC1, LRRK2, IL12B, and HLA-DR in phagocytes and IL23R, TYK2, and CSK in T cells, may interfere with the synergistic antimicrobial response between phagocytes and T cells.Further functional studies are required to investigate the roles of these causal genes in leprosy.
The pleiotropic effects between leprosy and autoimmunity/inflammatory diseases have always been focused, 57 and some shared mechanisms of pathogenic recognition and response among them have been reported. 9To further fully elucidate the pleiotropic effects of the causal variants, pheWAS was applied and confirmed the significant genetic overlap among leprosy, inflammatory diseases (IBD, CD and UC), and autoimmune diseases (rheumatoid arthritis, multiple sclerosis, and asthma).The concordant or discordant effects of the leading SNPs in the loci (IL12B, IL18RAP, IL23R, TNFSF15, LACC1, METTL1, and IL27) were identified in the present study, which further highlighted the involvement of immune responses and the shared genetic fingerprint of infectious diseases and autoimmune/inflammatory diseases.Intriguingly, we newly discovered that the leading SNP in the IL18RAP locus showed concordant effects on the risk between leprosy and asthma.Asthma is a Th2 cell-mediated type 2 inflammation, lepromatous leprosy also characterized by the enrichment of Th2 cells and cytokines, 58,59 wherein IL-18 activates Th2 cells to produce IL-4 and IL-13 in the absence of IL-12 or IL-15. 60The concordant effect shows that the variant rs17027258 in the IL18RAP locus in leprosy shares a similar mechanism of Th2-mediated responses in asthma.
We acknowledged the limitation.Given that the eQTL datasets were not specific to the Asian/Chinese population, some east-Asian or Chinese-specific risk allele (e.g., rs2229729 and rs12229654) does not exist in the eQTL datasets, making it difficult to annotate the regulatory consequences of those risk locus.What is more, some causal variants may affect protein levels through their effects on translation or protein stability without any effects on mRNA levels.Use of the population-specific eQTL dataset and protein quantitative trait loci is warranted to prioritize more causal variants and genes.
Collectively, our current study with a large sample size identified three novel loci and 15 previously established loci.The biological annotation of the fine-mapped causal variants and genes highlight the importance of immune responses in the pathogenesis of leprosy.This work does not only expand our understanding of inherited variation and further characterizes the genetic architecture of leprosy, but also increases the power to construct a GWASderived risk prediction model, which will facilitate the identification of genetically susceptible persons of leprosy patients and the implementation of post exposure prophylaxis.Subsequent functional studies should be conducted to explore the biological mechanisms underlying leprosy disease risk.

Study subjects
Six GWAS datasets were included in the initial metaanalysis: four published GWASs (GWAS1−4) and two new GWASs (GWAS5 and GWAS6), resulting in a final sample size of 7284 patients with leprosy and 7738 healthy controls (Table S1).GWAS1 included 706 leprosy patients and 1223 healthy controls genotyped using the Illumina Human 610K-Quad BeadChips; all the patients and controls were from northern China and of Han Chinese descent as described in our previous study. 6WAS2 included 374 patients with leprosy and 510 healthy controls of Han Chinese descent in northern China.All participants were genotyped using the Illumina Human 660K-Quad BeadChips as described in our previous study. 9AS3 and GWAS4 included 864 cases and 853 controls of Han Chinese descent from northern China and 333 cases and 573 controls of Han Chinese descent from southern China, respectively.All participants were genotyped using the Illumina Omni Zhonghua Array as described previously. 11WAS5 and GWAS6 are two new GWAS datasets comprising 3298 leprosy patients and 3272 healthy controls from northern China and 2302 leprosy patients and 1632 controls from southern China, respectively, all of which were of Han Chinese descent.These new samples were genotyped using the Illumina Asian Screening Array BeadChip.

QC procedures for GWAS
The QC procedures for the GWAS1−4 datasets have been described previously. 6,9,11The SNPs of GWAS5 and GWAS6 were separately quality controlled on the basis of the following criteria: (1) all of the SNPs on the X, Y, and mitochondrial chromosomes; intensity-only SNPs, and copy number variations were excluded; as were (2) all the SNPs with an MAF < 1% or call rate < 95% in all cases and controls and SNPs with a p value of the Hardy-Weinberg equilibrium (HWE) < 1.0 × 10 −8 in controls.
Next, samples were removed with a call rate of <96%, one of the duplicated or related individuals (first-, second-, or third-degree familial relationships) with a lower call rate, and population outliers based on the PC analysis method.
Finally, 474,857 SNPs in 6101 samples (2991 cases and 3110 controls) of GWAS5 and 477,111 SNPs in 3485 samples (2016 cases and 1469 controls) of GWAS6 passed the SNP and sample QC filters.A total of 7284 cases and 7738 controls in the six GWAS datasets that passed the QC procedure were used for imputation and genome-wide association analysis.Table S1 provides details on the sample size and other information of each dataset.

4.3
Imputation using the ChinaMAP reference panel PLINK v2.00a2LM (https://www.cog-genomics.org/plink/2.0/) was used to convert the genotype files from plink format to vcf-4.2 format based on the human reference genome (UCSC Genome Browser hg19).The LiftoverVcf command of software gatk-4.0.12.0 61 was used to lift over the vcf files from UCSC Genome Browser reference build hg19 to hg38.bcftools (Version: 1.2, http://samtools.github.io/bcftools/bcftools.html) was used to create a separate vcf file for each chromosome.The ChinaMAP reference panel that includes 10,155 unrelated Chinese individuals can achieve high-quality imputation for genetic studies of Chinese populations, and it was used as a reference panel.The imputation was performed on the ChinaMAP imputation server by uploading all genotype vcf files to the ChinaMAP browser (www.mBiobank.com).
The imputed SNPs were included in the subsequent association analysis to confirm whether they were well imputed (r 2 > 0.8) and had an MAF ≥ 1% in all samples and SNPs without significant deviation from the HWE in the controls (p > 1.0 × 10 −5 ).Finally, 5.88 to 6.43 million qualified variants were retained in the individual GWAS dataset, and 5,276,914 SNPs that were common across the studies were used in the genetic association analysis.

Statistical analysis
The single SNP association analysis was performed separately for each GWAS dataset in SNPTEST software (v2.4.1, https://www.well.ox.ac.uk/ ∼ gav/snptest/) by using the frequentist association test of an additive model.For each dataset, significant PCs based on EigenCorr2 method calculated by EigenCorr 62 were included as covariates in the association model to account for population stratification.PC1, PC7, and PC10 were included as covariates in the association model for GWAS5.PC1, PC3, and PC13 were included as covariates in the association model for GWAS6.The PCs that were included as covariates in the association model for GWAS1-4 were as same as described previously. 6,9,11he meta-analysis of the six GWAS datasets based on an inverse variance-weighted fixed-effects model was conducted using META (v1.7). 63Manhattan plot and Q-Q plot were generated for the meta-analysis by using R software (v.3.4.3).Regional plots were generated using the online LocusZoom tool (http://locuszoom.sph.umich.edu/).
Conditional logistic regression was performed at the top signal at each identified locus by SNPTEST software to assess whether an additional independent association existed.Loci with a leading SNP whose p value was less than 5 × 10 −8 were considered significant.The threshold of p < 1 × 10 −6 was used to identify secondary signals at each significant locus.Heterogeneity across independent datasets was assessed by determining the p values from Cochran's Q statistics, and p < 0.01 was considered significant for heterogeneity.

Colocalization with eQTLs
We tested whether the 20 independent loci of leprosy coincided with the cis-eQTLs of candidate genes in different tissues.The cis-eQTLs of three tissues (whole blood, skin, and tibial nerve) were retrieved from GTEx v7. 64Coincidence of each signal was tested by pairwise colocalization analysis of the loci using the Coloc package (v5.1.0) of R software. 65Loci were defined by a ±500 kb window around the respective leading SNPs.

Fine-mapping analysis to identify credibly causal variants
The susieR package of R software (https://github.com/stephenslab/susieR) was used to identify all potential causal variants for each independent locus associated with leprosy.The imputed genotype datasets were used, and variants were extracted using a 500 kb window around the leading SNP for each locus.For the susieR method, L, which indicates the (maximum) number of causal SNPs, was set based on the conditional analysis results, where one or two causal SNPs can be assumed.The scaled prior variance was set as the default.Next, 95% CS of SNPs containing a potential causal variant within a locus were generated.The SNP with the posterior inclusion probability (PIP) within each CS was also identified.The CADD score was used to identify variants with high predicted deleteriousness, and a CADD PHRED score of >15 indicated potential deleteriousness. 66e used scores from SpliceAI (https://spliceailookup. broadinstitute.org/),a deep-learning method that predicts the effects of variants on splicing.A variant with a SpliceAI score of >0.8 was defined as deleterious.
The ChIP-seq peaks of H3K4me3 and H3K27ac and the DNase-seq data of PBMCs and skin samples were acquired from the EpiMap database. 67We then visualized the sequencing data by using the UCSC Browser (http:// genome.ucsc.edu/).The Epilogos visualization model (https://epilogos.altius.org/)based on the epigenomic datasets across 833 biosamples was used to analyze the 18chromatin states.To further characterize the functional relevance of looping interactions in the enhancer region, we evaluated the H3K27ac data based on the 5k bin of Treg, Th17, and naive T primary cells in HiChIPdb, a database of HiChIP regulatory interactions. 68All annotations were based on the human reference genome GRCh37.
−11  , OR = 1.22) and SNP rs16972628, which located in the TA B L E 1 Association statistics for 17 significant genome-wide loci outside the MHC region.

a
Implicated through eQTL colocalization evidence PP > 0.8 in blood.b Implicated through eQTL colocalization evidence PP > 0.8 in nerve.c Implicated through eQTL colocalization evidence PP > 0.8 in skin.d Protein-altering SNP or splicing in 95% credible set.e Nearest protein-coding gene to SNP. f SNPs located in important gene-regulatory regions by manual review the EpiMap dataset.Bold indicates three novel locus.F I G U R E 1 Fine-mapping summary of all credibly causal variants and genes in nine significant genome-wide loci.(A) CADD score of each credibly causal variant (red indicates CADD score > 15).(B) The Max SpliceAI score value of each credibly causal variant (red indicates SpliceAI score > 0.8).(C) Red indicates that credibly causal variants were located in important gene-regulatory regions by manual review of the histone marker signals of H3K4me3 ChIP-seq, H3K27ac ChIP-seq, and DNase-seq from PBMCs and skin from EpiMap.(D-F) The posterior probability (PP) of the GWAS and eQTL signals colocated in the same gene in the indicated tissue (red indicates PP.H4.abf > 0.8).*In this loci, rs10817678 was identified as causal variant in TNFSF15 gene by colocalization of GWAS and whole blood eQTL signals (SNP.PP.H4 > 0.8).# Nearest protein-coding gene to SNP. -, not applicable for three standards in (A)-(C).

F
I G U R E 2 Fine-mapping analysis of the chr12:57,665,085-58,665,085 locus.(A) Regional association plot shows the association results (−log10 (p value)) using online Locuszoom tool based on ±500 kb of the leading SNP.The leading variant (rs10877013) is represented by a purple diamond.Variants are colored according to their linkage disequilibrium (LD) with the leading variant by using the 1000 genomes data from East Asian groups.(B) A zoomed-in view of the association results (−log10 (p value)) for the credible set region (purple diamond = leading GWAS SNP, yellow square = consensus SNPs, red triangle = SNPs located in the epigenomic peaks).(C) Fine-mapping analysis results obtained by the susieR package, with the posterior inclusion probability (PIP) that each SNP was causal as the y-axis.(D) Histone marker signals of H3K4me3 ChIP-seq, H3K27ac ChIP-seq, and DNase-seq from peripheral blood mononuclear cells (PBMCs) (green) and skin (orange) from EpiMap and visualized with the UCSC genome browser.(E) The 18-chromatin state (ChromHMM) annotations (indicated by colors) of 833 samples from EpiMap in immune samples versus nonimmune samples and visualized with the online epilogos search tool.Gene annotations are taken from the epilogos browser.
−7  ); GOCC_MHC class II protein complex (p = 2.21 × 10 −6 ), KEGG_Leishmania infection (p = 2.52 × 10 −6 ), KEGG_Graft versus host F I G U R E 3 Fine-mapping analysis of the chr15:74,587,571-75,587,571 loci.(A) Regional association plot shows the association results (−log10 (p value)) using online Locuszoom tool based on ±500 kb of the leading SNP.The leading variant (rs117618569) is represented by a purple diamond.Variants are colored according to their LD with the leading variant by using the 1000 genomes data from East Asian groups.(B) A zoomed-in view of the association results (−log10 (p value)) for the credible set region (purple diamond = leading GWAS SNP, red triangle = SNPs located in the epigenomic peaks).(C) Fine-mapping analysis results obtained by the susieR package, with the PIP that each SNP was causal as the y-axis.(D) Histone marker signals of H3K4me3 ChIP-seq, H3K27ac ChIP-seq, and DNase-seq from PBMCs (green) and skin (orange) from EpiMap and visualized with the UCSC genome browser.(E) The 18-chromatin state (ChromHMM) annotations (indicated by colors) of 833 samples from EpiMap and visualized with the online epilogos search tool.Gene annotations are taken from the epilogos browser.disease (p = 3.96 × 10 −6 ), KEGG_Allograft rejection (p = 7.07 × 10 −5 ), KEGG_Autoimmune thyroid disease (p = 2.27 × 10 −4 ), and REACTOME_PD-1 signaling (p = 1.35 × 10 −5

F I G U R E 4
Epigenetic modification of the chr15:74,587,571-75,587,571 loci.(A) The 18-chromatin state (ChromHMM) annotations (indicated by colors) of 833 samples from EpiMap in immune samples versus nonimmune samples and visualized with the online epilogos search tool.(B) Histone marker signals of H3K27ac ChIP-seq from skin, PBMCs, T cells, NK cells, CD1c + myeloid dendritic cells, CD14 + monocyte cells, and B cells from EpiMap and visualized with the UCSC genome browser.(C) HiChIP 3D signal enrichment at the enhancer region chr15:75,090,000-75,095,000 and the CSK TSS region chr15:75,070,000-75,075,000 in Treg, Th17, and naïve T primary cells.Gene annotations are taken from the epilogos browser.

F I G U R E 5
Pathways associated with known leprosy susceptibility genes.(A) M. leprae is recognized and digested into antigens (e.g., muramyl dipeptide) by phagocytes.(B) NOD2 senses the muramyl dipeptide and recruits RIPK2 to release antibacterial cytokines such as IL-1β, IL-12 and IL-23 by NF-κB, which is mediated by TNFSF15, LACC1, and LRRK2.(C) The antigens of M. leprae are presented by HLA to activate T cells.(D) Simultaneously, secretion of IL-12 and IL-23 activates T cells to promote the IFN-γ-mediated antimicrobial response through the JAK/STAT signaling pathway.(E) Chronic M. leprae infection upregulates the expression of the inhibitory immune receptor PD-1, which recruits the tyrosine kinase CSK to inactivate TCR signaling and inhibit T cell-mediated antimicrobial response.Proteins encoded by leprosy susceptibility genes are depicted in red, and the abnormality of these molecules may interferes with the synergistic antimicrobial response between phagocytes and T cells.TCR, T cell receptor; p-STAT, phosphorylated STAT.
For multiple independently segregating SNPs in the flanking 1 Mb regions, loci were expanded to the border of the ±500 kb window around the respective signals.PP.H4.abf value shows the posterior probability of whether a shared causal variant exists in each pair of signals.SNP.PP.H4 value shows the posterior probability of each SNP was causal variant in each pair of signals.PP.H4.abf > 0.8 and SNP.PP.H4 > 0.8 were considered as successful colocalizations in one gene and in one SNP, respectively.